- How do full-time and part-time employment patterns differ by sex within each province?
- Do certain provinces show local economic weaknesses, and if so, what are the drivers?
- If we find evidence of confounding variables, what role might public policy, dominant industries, the distribution of workers by sex across industries, or broader issues of gender equity in Canadian workplaces and culture play?
- Can demographic differences help explain variation in employment rates across provinces?
Through these questions, we aimed to uncover patterns linking demographics to employment rates in Canada. The analysis in our paper highlighted broader findings about long-term trends and seasonality in employment, as well as significant differences between provinces and across sexes over time. The code below addresses some, though not all, of these questions.
Example: Ontario
##     month             variable        sex Ontario
## 1 1976-01           Employment Both sexes  3707.4
## 2 1976-01           Employment    Females  1426.7
## 3 1976-01           Employment      Males  2280.7
## 4 1976-01 Full-time employment Both sexes  3209.1
## 5 1976-01 Full-time employment    Females  1084.1
## 6 1976-01 Full-time employment      Males  2124.9
What proportion of part-time employees in Ontario are female?
What proportion of all Ontario female workers are part-time?
What proportion of all workers in Ontario are both female and part-time?
Summary statistics
Plots
## [1] 1.152995
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6140 0.6695 0.6867 0.6848 0.7017 0.7307
## [1] 1.350914
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2335 0.2584 0.2668 0.2664 0.2741 0.2981
## [1] 0.8670492
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09144 0.11717 0.12285 0.12113 0.12795 0.13643
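The three proportions can be computed directly from the table shown under "Example: Ontario", taking part-time employment as total employment minus full-time employment (a definitional assumption; the underlying data may carry part-time as its own category). A base-R sketch for the January 1976 row:

```r
# Sketch (base R): the three conditional proportions for 1976-01,
# assuming part-time = total employment - full-time employment.
emp <- data.frame(
  variable = rep(c("Employment", "Full-time employment"), each = 3),
  sex      = rep(c("Both sexes", "Females", "Males"), times = 2),
  Ontario  = c(3707.4, 1426.7, 2280.7, 3209.1, 1084.1, 2124.9)
)
val <- function(v, s) emp$Ontario[emp$variable == v & emp$sex == s]

pt_f   <- val("Employment", "Females")    - val("Full-time employment", "Females")
pt_all <- val("Employment", "Both sexes") - val("Full-time employment", "Both sexes")

pt_f / pt_all                           # P(female | part-time),   ~0.688
pt_f / val("Employment", "Females")     # P(part-time | female),   ~0.240
pt_f / val("Employment", "Both sexes")  # P(female and part-time), ~0.092
```

These single-month values sit inside the corresponding summary ranges printed above.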
What proportion of full-time employees in Ontario are female?
What proportion of all Ontario female workers are full-time?
What proportion of all workers in Ontario are both female and full-time?
Summary statistics
Plots
## [1] 0.1252106
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3325 0.3809 0.4092 0.4059 0.4352 0.4480
## [1] 1.348098
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7018 0.7259 0.7332 0.7336 0.7416 0.7665
## [1] 0.3362953
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2861 0.3180 0.3343 0.3335 0.3521 0.3640
Trend strength
Seasonality strength
Plots
## [1] 0.8959216
## [1] 0.01131532
## Jun Jul Aug Sep Oct Nov
## 2019 0.6348039 0.6139722 0.6253569 0.6381162 0.6507038 0.6363181
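The two strengths printed above are typically computed from an STL decomposition, as F_trend = max(0, 1 - Var(R)/Var(T + R)) and F_seasonal = max(0, 1 - Var(R)/Var(S + R)). A base-R sketch, illustrated on a built-in monthly series rather than the Ontario proportion (which is not reproduced here):

```r
# Sketch (base R): trend and seasonal strength from an STL decomposition.
# F_trend = max(0, 1 - var(remainder) / var(trend + remainder))
# F_seas  = max(0, 1 - var(remainder) / var(seasonal + remainder))
fit  <- stl(log(AirPassengers), s.window = "periodic")  # stand-in series
comp <- fit$time.series
rem  <- comp[, "remainder"]
trend_strength    <- max(0, 1 - var(rem) / var(comp[, "trend"] + rem))
seasonal_strength <- max(0, 1 - var(rem) / var(comp[, "seasonal"] + rem))
c(trend_strength, seasonal_strength)
```

A high trend strength with near-zero seasonal strength, as in the output above, says the series is dominated by its long-run level rather than by a repeating monthly pattern.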
## ** Teraesvirta's neural network test **
## Null hypothesis: Linearity in "mean"
## X-squared = 6.666735 df = 2 p-value = 0.03567277
##
## ** White neural network test **
## Null hypothesis: Linearity in "mean"
## X-squared = 10.93809 df = 2 p-value = 0.004215253
##
## ** Keenan's one-degree test for nonlinearity **
## Null hypothesis: The time series follows some AR process
## F-stat = 0.003973227 p-value = 0.9497725
##
## ** McLeod-Li test **
## Null hypothesis: The time series follows some ARIMA process
## Maximum p-value = 0
##
## ** Tsay's Test for nonlinearity **
## Null hypothesis: The time series follows some AR process
## F-stat = 0.7983433 p-value = 0.9278586
##
## ** Likelihood ratio test for threshold nonlinearity **
## Null hypothesis: The time series follows some AR process
## Alternative hypothesis: The time series follows some TAR process
## X-squared = 29.481 p-value = 0.2106931
## $Terasvirta
##
## Teraesvirta Neural Network Test
##
## data: ts(time.series)
## X-squared = 6.6667, df = 2, p-value = 0.03567
##
##
## $White
##
## White Neural Network Test
##
## data: ts(time.series)
## X-squared = 10.938, df = 2, p-value = 0.004215
##
##
## $Keenan
## $Keenan$test.stat
## [1] 0.003973227
##
## $Keenan$df1
## [1] 1
##
## $Keenan$df2
## [1] 385
##
## $Keenan$p.value
## [1] 0.9497725
##
## $Keenan$order
## [1] 16
##
##
## $McLeodLi
## $McLeodLi$p.values
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
##
## $Tsay
## $Tsay$test.stat
## [1] 0.7983433
##
## $Tsay$p.value
## [1] 0.9278586
##
## $Tsay$order
## [1] 16
##
##
## $TarTest
## $TarTest$percentiles
## [1] 24.8139 75.1861
##
## $TarTest$test.statistic
## [1] 29.481
##
## $TarTest$p.value
## [1] 0.2106931
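Of the battery above, the McLeod-Li test is the easiest to reproduce without extra packages: it is a Ljung-Box test applied to the squared, mean-centred series, and small p-values across many lags (the output above reports a maximum p-value of 0) indicate ARCH-like, nonlinear behaviour. A base-R sketch on a simulated stand-in series:

```r
# Sketch (base R): McLeod-Li style check, i.e. a Ljung-Box test
# applied to the squared, mean-centred series at each lag.
set.seed(1)
x <- arima.sim(list(ar = 0.5), n = 400)  # stand-in for the employment series
z <- as.numeric(x - mean(x))
p_values <- sapply(1:26, function(k) {
  Box.test(z^2, lag = k, type = "Ljung-Box")$p.value
})
round(p_values, 4)
```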
## ETS(M,N,N)
##
## Call:
## ets(y = prop_PT.f_ontario.train)
##
## Smoothing parameters:
## alpha = 0.6392
##
## Initial states:
## l = 0.693
##
## sigma: 0.0112
##
## AIC AICc BIC
## -1539.180 -1539.122 -1527.066
## [1] -0.9999242
## ETS(A,N,N)
##
## Call:
## ets(y = prop_PT.f_ontario.train, lambda = 1.106443)
##
## Box-Cox transformation: lambda= 1.1064
##
## Smoothing parameters:
## alpha = 0.6393
##
## Initial states:
## l = -0.3015
##
## sigma: 0.0075
##
## AIC AICc BIC
## -1569.636 -1569.578 -1557.523
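The second ETS fit first applies a Box-Cox transformation with lambda = 1.1064 (a value presumably selected automatically, e.g. via forecast::BoxCox.lambda). The transformation itself is simple; a base-R sketch of the standard definition:

```r
# Sketch (base R): the Box-Cox transformation and its inverse,
# using the standard definition (log when lambda is 0).
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(z, lambda) {
  if (lambda == 0) exp(z) else (lambda * z + 1)^(1 / lambda)
}
box_cox(0.693, 1.106443)  # about -0.30, matching the reported initial state
```

Transforming the untransformed fit's initial level (0.693) with the reported lambda recovers the transformed fit's initial level (-0.3015), which confirms the two models describe the same series on different scales.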
## Series: prop_PT.f_ontario.train
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 0.6930
## s.e. 0.0008
##
## sigma^2 = 0.0002581: log likelihood = 1136.86
## AIC=-2269.73 AICc=-2269.7 BIC=-2261.65
## Series: prop_PT.f_ontario.train
## Model: NNAR(26,1,14)[12]
## Call: nnetar(y = prop_PT.f_ontario.train)
##
## Average of 20 networks, each of which is
## a 26-14-1 network with 393 weights
## options were - linear output units
##
## sigma^2 estimated as 1.69e-07
| model         | train MASE | test MASE |
|---------------|-----------|-----------|
| naive         | 0.5325    | 1.6587    |
| randomwalk    | 0.4915    | 1.4284    |
| best ETS      | 0.5028    | 1.2835    |
| BoxCox ETS    | 0.5028    | 1.2836    |
| log-trans ETS | 0.5029    | 1.2828    |
| best ARIMA    | 1.0934    | 2.5262    |
| best neural   | 0.0241    | 0.6182    |
| bagged        | 0.4675    | 1.0246    |
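The train/test MASE figures are scaled errors: the mean absolute forecast error divided by the in-sample mean absolute error of a one-step naive forecast (forecast::accuracy uses the seasonal analogue for seasonal series). A base-R sketch of the non-seasonal version, on hypothetical data:

```r
# Sketch (base R): mean absolute scaled error (MASE), non-seasonal form.
# scale = in-sample MAE of the lag-1 naive forecast on the training data.
mase <- function(actual, forecast, train) {
  scale <- mean(abs(diff(train)))
  mean(abs(actual - forecast)) / scale
}

set.seed(42)
y        <- 0.69 + cumsum(rnorm(120, sd = 0.005))  # stand-in proportion series
train    <- y[1:108]
test     <- y[109:120]
naive_fc <- rep(train[108], 12)                    # flat naive forecast
mase(test, naive_fc, train)
```

A test MASE above 1, as for most models in the table, means the model's out-of-sample forecasts are worse, on that scale, than a one-step naive forecast was in-sample.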
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,N)
## Q* = 60.342, df = 24, p-value = 5.715e-05
##
## Model df: 0. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 60.1, df = 24, p-value = 6.184e-05
##
## Model df: 0. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 60.08, df = 24, p-value = 6.224e-05
##
## Model df: 0. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 4186.3, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 88.938, df = 24, p-value = 2.16e-09
##
## Model df: 0. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from NNAR(26,1,14)[12]
## Q* = 17.676, df = 24, p-value = 0.8184
##
## Model df: 0. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from baggedModel
## Q* = 132.73, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
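Each block above is a Ljung-Box portmanteau check on one model's residuals (the format matches forecast::checkresiduals). Base R's stats::Box.test computes the same statistic directly; a sketch on simulated white-noise residuals:

```r
# Sketch (base R): Ljung-Box test with 24 lags, as in the checks above.
# A model whose residuals are white noise gives a large p-value
# (cf. NNAR above); small p-values mean leftover autocorrelation.
set.seed(7)
resid <- rnorm(392)  # stand-in residual series
bt <- Box.test(resid, lag = 24, type = "Ljung-Box")
bt
```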
As described on p. 17 of the final paper, the bagged model is preferred because bootstrap aggregation adapts to the data, offers greater flexibility, and avoids imposing a decreasing trend. Relative to the neural net, the bagged forecasts indicate that the series level, the proportion of Ontario female workers who work part-time (roughly 26-27%), is unlikely to change substantially over the coming decade. Given the high variability in the granular series (also evident in the attempted moving-average decomposition), bagging is the most conservative choice.
To support transparency and replication for this GitHub project, we restructured and annotated the code so that others can follow the workflow and reproduce the results. This process underscored the importance of clarity in both modeling and communication, ensuring that the conclusions are not only accurate but also accessible for others to test, adapt, and extend (see p. 18 for conclusions and future suggestions).